Implicating effector genes at COVID-19 GWAS loci using promoter-focused Capture-C in disease-relevant immune cell types

Background: SARS-CoV-2 infection results in a broad spectrum of COVID-19 disease, from mild or no symptoms to hospitalization and death. COVID-19 disease severity has been associated with some pre-existing conditions and the magnitude of the adaptive immune response to SARS-CoV-2, and a recent genome-wide association study (GWAS) of the risk of critical illness revealed a significant genetic component. To gain insight into how human genetic variation attenuates or exacerbates disease following SARS-CoV-2 infection, we implicated putatively functional COVID risk variants in the cis-regulatory landscapes of human immune cell types with established roles in disease severity and used high-resolution chromatin conformation capture to map these disease-associated elements to their effector genes.

Results: This functional genomic approach implicates 16 genes involved in viral replication, the interferon response, and inflammation. Several of these genes (PAXBP1, IFNAR2, OAS1, OAS3, TNFAIP8L1, GART) were differentially expressed in immune cells from patients with severe versus moderate COVID-19 disease, and we demonstrate a previously unappreciated role for GART in T cell-dependent antibody-producing B cell differentiation in a human tonsillar organoid model.

Conclusions: This study offers immunogenetic insight into the basis of COVID-19 disease severity and implicates new targets for therapeutics that limit SARS-CoV-2 infection and its resultant life-threatening inflammation.

Supplementary Information: The online version contains supplementary material available at 10.1186/s13059-022-02691-1.


Figure S2
Source of Capture-C data for hESCs, monocytes, naïve B cells, GCB, naïve CD4+ T cells, and TFH. (A) The number of biological replicates from distinct genetic backgrounds and the source publication for each of the datasets used in this study. (B) Number of regions identified as interacting with a promoter. (C) Cumulative distribution of distances between baits and interacting regions. (D) Proportion of promoter interactions that are located within TADs or that cross TAD boundaries. (E) The proportion of promoter interactions that were called as bait-to-bait interactions.

Figure S3
UCSC browser tracks depicting chromatin accessibility (grey), promoter interactions (red) and proxy SNPs (black) at each COVID-19 GWAS locus in each cell type.
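
The three tracks in this figure correspond to the inputs of variant-to-gene (V2G) mapping. As a minimal sketch, assuming a proxy SNP is assigned to a gene when it lies in an open chromatin region (OCR) that overlaps the non-bait end of a promoter interaction, the intersection could look like the following. All identifiers, coordinates, and the score cutoff of 5 are hypothetical and chosen for illustration; they are not taken from the study, and SNPs that lie directly in promoters are not handled here.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, half-open coordinates
    end: int


class Interaction(NamedTuple):
    gene: str            # gene annotated to the bait (promoter) fragment
    bait: Interval
    other_end: Interval  # fragment contacted by the promoter
    score: float         # interaction score (illustrative values only)


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def contains(region: Interval, chrom: str, pos: int) -> bool:
    """True when a single position falls inside the region."""
    return region.chrom == chrom and region.start <= pos < region.end


def map_variants_to_genes(snps, ocrs, interactions, min_score=5.0):
    """Link each proxy SNP to genes via OCRs contacted by promoter interactions.

    min_score is an assumed cutoff, not a value stated in the legends.
    """
    calls = []
    for snp_id, (chrom, pos) in snps.items():
        for ocr in ocrs:
            if not contains(ocr, chrom, pos):
                continue  # proxy is not in this open chromatin region
            for ix in interactions:
                if ix.score >= min_score and overlaps(ocr, ix.other_end):
                    calls.append((snp_id, ix.gene, ix.score))
    return calls


# Toy data (hypothetical identifiers and coordinates).
snps = {"rs_toy1": ("chr1", 1050), "rs_toy2": ("chr1", 5000)}
ocrs = [Interval("chr1", 1000, 1200), Interval("chr1", 4900, 5100)]
interactions = [
    Interaction("GENE_A", Interval("chr1", 90000, 91000), Interval("chr1", 900, 1100), 7.2),
    Interaction("GENE_B", Interval("chr1", 200000, 201000), Interval("chr1", 4950, 5050), 3.0),
]

calls = map_variants_to_genes(snps, ocrs, interactions)
print(calls)
```

In this toy example only the first SNP yields a call, because the interaction contacting the second OCR falls below the assumed score cutoff. The nested loops are quadratic and only suitable for small inputs; a genome-scale analysis would typically use an interval index instead.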

Figure S4
Top IPA gene ontology network for genes implicated by COVID-19 V2G.

Figure S5
The top IPA gene ontology network for transcription factors whose binding is predicted to be influenced by COVID-19-associated SNPs.

Figure S6
Gating strategy for sorting of naïve B cells and GCB cells for this study.

Supplementary Table Legends

Table S1 - Summary of ATAC-seq peak calls - The number and median length of peaks in the IDR optimal peak set called by the ENCODE pipeline, and the percentage of OCRs annotated to different genomic regions (promoters, 5' UTRs, CDS, first introns, other introns, 3' UTRs, and intergenic regions).

Table S2 - Summary of interactions called in PCC - The number of interactions and the median distance between interacting fragments; the percentage of loops within annotated TAD boundaries; the percentage of interactions called between two bait fragments; the number of OCRs contacted or not contacted by a promoter interaction; and the percentage of OCRs with promoter contacts.

Table S3 - COVID-19 variant-to-gene mapping (V2G) - The proxies queried for each sentinel, with the corresponding proxy coordinates (hg19) and R² value. Cell indicates the cell type in which the variant-to-gene call was made. proxyOCR indicates the location of the proxy SNP, given as the genomic coordinates of the OCR that contains the proxy. Gene ID and gene name identify the implicated gene. Frag indicates the resolution at which the interaction was called; b2b indicates whether the interaction was called as a bait-to-bait interaction. Score indicates the Chicago score (-log adjusted P value). Proxy2BaitDistance is the distance between the proxy and the bait. Other end frag and bait frag are the coordinates of the two ends of the interacting regions. For bait-to-bait calls, oeGenes indicates the gene annotated to the other-end bait. For proxies in promoters, proxy2GeneDist indicates the distance between the proxy and the TSS of the indicated gene. SNPs located in promoters have NAs for Capture-C information.

Table S5 - Modeling COVID-19-associated SNP effects on TF binding - Disruption of TF binding by V2G-implicated proxies was predicted using motifBreakR with position weight matrices queried from JASPAR2018.
For each PCC-established SNP-gene pair, the motif scores (scoreRef/scoreAlt) give the TF binding site score for the reference and alternative alleles, along with the associated P values (Refpvalue/Altpvalue). Effect indicates whether the allele swap is expected to have a weak or strong effect on motif binding. TF_gene_name/TF_gene_id identify the TF whose binding site is predicted to be affected by the proxy. TF_tpm is the mean TPM value for the TF in the given cell type.
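
To illustrate what a pair of reference and alternative motif scores represents, the sketch below scores both alleles of a SNP against a log-odds position weight matrix and keeps the best-scoring window that covers the variant. This is a simplified illustration and not motifBreakR's implementation: the matrix, sequence, pseudocount, and uniform background are hypothetical, only the forward strand is scanned, and no P values or weak/strong effect calls are computed.

```python
import math

BASES = "ACGT"


def log_odds_pwm(ppm, background=0.25, pseudocount=0.01):
    """Convert per-position base probabilities into log2 odds against a uniform background."""
    pwm = []
    for column in ppm:
        total = sum(column[b] + pseudocount for b in BASES)
        pwm.append(
            {b: math.log2(((column[b] + pseudocount) / total) / background) for b in BASES}
        )
    return pwm


def score_window(pwm, window):
    """Sum the log-odds contribution of each base in a window as wide as the motif."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_score(pwm, seq, snp_index):
    """Best motif score over all forward-strand windows that include the SNP position."""
    width = len(pwm)
    first = max(0, snp_index - width + 1)
    last = min(snp_index, len(seq) - width)
    return max(score_window(pwm, seq[s:s + width]) for s in range(first, last + 1))


# Hypothetical 4-bp motif favouring TGAC (probabilities are made up).
ppm = [
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
]
pwm = log_odds_pwm(ppm)

ref_seq = "AATGACTT"  # reference allele G at index 3
snp_index = 3
alt_seq = ref_seq[:snp_index] + "A" + ref_seq[snp_index + 1:]  # alternative allele A

score_ref = best_score(pwm, ref_seq, snp_index)
score_alt = best_score(pwm, alt_seq, snp_index)
print(f"scoreRef={score_ref:.2f} scoreAlt={score_alt:.2f}")
```

Here the alternative allele replaces the strongly preferred G in the second motif position, so the reference sequence scores higher than the alternative. A large drop of this kind is the sort of difference that would be flagged as a disruption of the predicted binding site.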